Auditory Processing of Speech: The COG Effect

Authors

  • Daniel E. Hack
  • Ashok Krishnamurthy
Abstract

The “COG effect” is an auditory phenomenon in which a human listener can perceive the spectral center of gravity (COG), or centroid, of a frequency band up to 3.5 Bark wide. Engineering applications of this effect include extracting centroid features for use in speech recognition algorithms. This work investigates the representation of the spectral centroid in the auditory system using a computational model of the auditory pathway. Specifically, a model of the representation of sound in the primary auditory cortex (the termination of the auditory pathway in the temporal lobe of the cortex) is used to derive a spectral centroid measure. This measure is then used to predict the results of two COG effect listening experiments, the first a vowel matching task and the second a pitch matching task. By demonstrating that the properties of a subset of the cortical representation of sound match those of the COG percept, this work concludes that cortical processing and the resulting cortical representation may represent the mechanism underlying the COG effect.

Introduction and Objectives

The human auditory system is superior to machines in nearly all speech-related listening tasks, such as speech recognition and speaker identification. In light of this fact, researchers have looked to the auditory system for insight into how to improve their algorithms. This trend has led to the incorporation of principles of auditory processing into state-of-the-art speech processing algorithms. For example, mel-frequency cepstral coefficient (MFCC) feature vectors are routinely used to encode the spectral features of speech signals as the first stage in speech recognition and speaker identification algorithms. The MFCC is based on two principles of auditory processing: first, the frequency analysis of the cochlea (the organ of the human inner ear) may be simulated as a bank of bandpass “auditory filters,” whose output is typically called the “auditory spectrum”; and second, the auditory spectrum is subject to a second layer of processing in which properties of the spectral profile are extracted. Accordingly, the MFCC is calculated by passing the power spectrum of an input signal through an auditory filterbank (generating the auditory spectrum), then applying logarithmic compression and a cosine transform (extracting features of the auditory spectral profile). Thus, by imitating the processing of the auditory system, the MFCC has enabled increased performance in speech recognition and speaker identification algorithms. This motivates the study of auditory processing itself, specifically its speech processing properties, as a means to inspire novel speech processing algorithms.

One such avenue of research investigates spectral integration: how the auditory system combines information across frequency. A speech waveform is a broadband signal, containing frequency content from roughly 100 to 6000 Hz. The spectral profile of a speech signal typically contains several prominent peaks, called formants, which correspond to the resonant frequencies of the vocal tract at the time of production. The formant frequencies are labeled F1, F2, F3, etc., in order of increasing frequency. The idea of spectral integration in speech perception research has been formalized as the “center of gravity (COG) effect,” which states that two closely spaced vowel formants are effectively “merged” into a single spectral prominence whose COG (mean frequency) determines the phonetic quality of the vowel.
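To make the two processing ideas above concrete — the MFCC pipeline (auditory filterbank, logarithmic compression, cosine transform) and the spectral center of gravity of a frequency band — the following Python sketch gives a minimal, illustrative implementation. The triangular mel filterbank is a common stand-in for the “auditory filters” mentioned in the text; the band edges, frame length, and the toy two-tone “vowel” are assumptions chosen only for illustration. The sketch computes the COG in Hz, whereas the study described here measures it over a Bark-scaled band using a cortical model, which this code does not attempt to reproduce.

```python
"""Illustrative sketch (not the authors' implementation): a crude 'auditory
spectrum' via a triangular mel filterbank, MFCC-like features via log
compression + DCT, and the spectral center of gravity (COG) of one band."""
import numpy as np
from scipy.fft import dct


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, fs, f_lo=100.0, f_hi=6000.0):
    """Triangular filters spaced evenly on the mel scale (an assumed stand-in
    for the bank of bandpass auditory filters)."""
    mel_edges = np.linspace(hz_to_mel(f_lo), hz_to_mel(f_hi), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_edges) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, ctr, hi = bins[i], bins[i + 1], bins[i + 2]
        fb[i, lo:ctr] = (np.arange(lo, ctr) - lo) / max(ctr - lo, 1)  # rising edge
        fb[i, ctr:hi] = (hi - np.arange(ctr, hi)) / max(hi - ctr, 1)  # falling edge
    return fb


def auditory_spectrum(frame, fs, n_filters=26):
    """Power spectrum passed through the filterbank ('auditory spectrum')."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    return mel_filterbank(n_filters, len(frame), fs) @ power


def mfcc(frame, fs, n_coeffs=13):
    """Log-compress the auditory spectrum, then apply a cosine transform."""
    return dct(np.log(auditory_spectrum(frame, fs) + 1e-10),
               type=2, norm='ortho')[:n_coeffs]


def spectral_cog(frame, fs, band=(300.0, 1200.0)):
    """Power-weighted mean frequency (COG) of one band, loosely analogous to
    the COG of a pair of closely spaced formants (here in Hz, not Bark)."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return np.sum(freqs[mask] * power[mask]) / np.sum(power[mask])


if __name__ == "__main__":
    fs = 16000
    t = np.arange(1024) / fs
    # Toy 'vowel-like' frame: two nearby sinusoids standing in for F1 and F2.
    frame = np.sin(2 * np.pi * 500 * t) + 0.8 * np.sin(2 * np.pi * 900 * t)
    print("MFCCs:", np.round(mfcc(frame, fs), 2))
    print("COG of 300-1200 Hz band: %.1f Hz" % spectral_cog(frame, fs))
```

For the toy frame above, the reported COG falls between the two component frequencies, weighted toward the stronger 500 Hz tone, which is the sense in which two nearby spectral prominences are summarized by a single mean frequency.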
“Phonetic quality” refers to the properties of the vowel that influence the listener’s determination of vowel identity, i.e., the phonemic decision. In other words, in vowels with closely spaced formants, the COG of the two formants is a salient cue that plays a significant role in vowel, and thus speech, recognition. Independently, the engineering literature has recently incorporated a similar idea into speech recognition algorithms. A line of research led by Kuldip Paliwal (Paliwal 1998, Chen et al. 2004) is investigating the …




Publication date: 2005